1.
J Biomed Semantics ; 6: 3, 2015.
Article in English | MEDLINE | ID: mdl-25973165

ABSTRACT

BACKGROUND: Linked Data has gained some attention recently in the life sciences as an effective way to provide and share data. As a part of the Semantic Web, data are linked so that a person or machine can explore the web of data. Resource Description Framework (RDF) is the standard means of implementing Linked Data. In the process of generating RDF data, not only are data linked to one another, but the links themselves are also characterized by ontologies, thereby allowing the types of links to be distinguished. Although defining an ontology carries a high labor cost for data providers, the merit lies in the higher level of interoperability with data analysis and visualization software. This increase in interoperability facilitates the multi-faceted retrieval of data, and the appropriate data can be quickly extracted and visualized. Such retrieval is usually performed using SPARQL (SPARQL Protocol and RDF Query Language), which is used to query RDF data stores. For the database provider, such interoperability will surely lead to an increase in the number of users. RESULTS: This manuscript describes the experiences and discussions shared among participants of the week-long BioHackathon 2011 who went through the development of RDF representations of their own data and developed specific RDF and SPARQL use cases. Advice regarding considerations to take when developing RDF representations of data is provided for bioinformaticians considering making data available and interoperable. CONCLUSIONS: Participants of the BioHackathon 2011 were able to produce RDF representations of their data and gain a better understanding of the requirements for producing such data in a period of just five days. We summarize the work accomplished with the hope that it will be useful for researchers involved in developing laboratory databases or data analysis, and those who are considering such technologies as RDF and Linked Data.
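
The abstract above describes RDF data as typed links that SPARQL can query. As a rough illustration only, the sketch below stores a few triples as Python tuples and joins two triple patterns, which is approximately what a SPARQL basic graph pattern does. Every identifier in it (the `ex:` prefix, the gene and protein names, the `match` and `query` helpers) is hypothetical and does not come from the paper or from any real database; a real deployment would use an RDF store and a SPARQL engine.

```python
# Toy triple store: each RDF statement is a (subject, predicate, object) tuple.
# All URIs and values below are hypothetical examples, not real database records.
TRIPLES = [
    ("ex:gene1", "rdf:type", "ex:Gene"),
    ("ex:gene1", "rdfs:label", "psbA"),
    ("ex:gene1", "ex:encodes", "ex:protein1"),
    ("ex:gene2", "rdf:type", "ex:Gene"),
    ("ex:gene2", "rdfs:label", "nifH"),
    ("ex:protein1", "rdf:type", "ex:Protein"),
]


def match(pattern, triples, binding=None):
    """Yield variable bindings for one triple pattern; terms starting with '?' are variables."""
    binding = binding or {}
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or reject the triple if it is already bound differently.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield candidate


def query(patterns, triples):
    """Join several triple patterns, roughly like a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, triples, b)]
    return bindings


# Roughly: SELECT ?gene ?label WHERE { ?gene rdf:type ex:Gene . ?gene rdfs:label ?label }
results = query(
    [("?gene", "rdf:type", "ex:Gene"), ("?gene", "rdfs:label", "?label")],
    TRIPLES,
)
labels = sorted(r["?label"] for r in results)
```

Because the predicate is part of every statement, the same data can be filtered by link type (`rdf:type`, `rdfs:label`, `ex:encodes`) without any change to the storage layout, which is the interoperability benefit the abstract refers to.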

2.
Nucleic Acids Res ; 42(Database issue): D666-70, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24275496

ABSTRACT

To understand newly sequenced genomes of closely related species, comprehensively curated reference genome databases are becoming increasingly important. We have extended CyanoBase (http://genome.microbedb.jp/cyanobase), a genome database for cyanobacteria, and newly developed RhizoBase (http://genome.microbedb.jp/rhizobase), a genome database for rhizobia, nitrogen-fixing bacteria associated with leguminous plants. Both databases focus on the representation and reusability of reference genome annotations, which are continuously updated by manual curation. Domain experts have extracted names, products and functions of each gene reported in the literature. To ensure effectiveness of this procedure, we developed the TogoAnnotation system offering a web-based user interface and a uniform storage of annotations for the curators of the CyanoBase and RhizoBase databases. The number of references investigated for CyanoBase increased from 2260 in our previous report to 5285, and for RhizoBase, we perused 1216 references. The results of these intensive annotations are displayed on the GeneView pages of each database. Advanced users can also retrieve this information through the representational state transfer-based web application programming interface in an automated manner.


Subjects
Alphaproteobacteria/genetics , Cyanobacteria/genetics , Databases, Genetic , Genome, Bacterial , Bradyrhizobium/genetics , Genes, Bacterial , Internet , Mesorhizobium/genetics , Molecular Sequence Annotation , Rhizobium/genetics , Sinorhizobium/genetics
3.
J Biomed Semantics ; 4(1): 6, 2013 Feb 11.
Article in English | MEDLINE | ID: mdl-23398680

ABSTRACT

BACKGROUND: BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Science (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research. RESULTS: The theme of BioHackathon 2010 was the 'Semantic Web', and all attendees gathered with the shared goal of producing Semantic Web data from their respective resources, and/or consuming or interacting with those data using their tools and interfaces. We discussed topics including guidelines for designing semantic data and interoperability of resources. We consequently developed tools and clients for analysis and visualization. CONCLUSION: We provide a meeting report from BioHackathon 2010, in which we describe the discussions, decisions, and breakthroughs made as we moved towards compliance with Semantic Web technologies - from source provider, through middleware, to the end-consumer.

4.
Microbes Environ ; 27(3): 306-15, 2012.
Article in English | MEDLINE | ID: mdl-22452844

ABSTRACT

Bradyrhizobium sp. S23321 is an oligotrophic bacterium isolated from paddy field soil. Although S23321 is phylogenetically close to Bradyrhizobium japonicum USDA110, a legume symbiont, it is unable to induce root nodules in siratro, a legume often used for testing Nod factor-dependent nodulation. The genome of S23321 is a single circular chromosome, 7,231,841 bp in length, with an average GC content of 64.3%. The genome contains 6,898 potential protein-encoding genes, one set of rRNA genes, and 45 tRNA genes. Comparison of the genome structure between S23321 and USDA110 showed strong colinearity; however, the symbiosis islands present in USDA110 were absent in S23321, whose genome lacked a chaperonin gene cluster (groELS3) for symbiosis regulation found in USDA110. A comparison of sequences around the tRNA-Val gene strongly suggested that S23321 contains an ancestral-type genome that precedes the acquisition of a symbiosis island by horizontal gene transfer. Although S23321 contains a nif (nitrogen fixation) gene cluster, the organization, homology, and phylogeny of the genes in this cluster were more similar to those of photosynthetic bradyrhizobia ORS278 and BTAi1 than to those on the symbiosis island of USDA110. In addition, we found genes encoding a complete photosynthetic system, many ABC transporters for amino acids and oligopeptides, two types (polar and lateral) of flagella, multiple respiratory chains, and a system for lignin monomer catabolism in the S23321 genome. These features suggest that S23321 is able to adapt to a wide range of environments, probably including low-nutrient conditions, with multiple survival strategies in soil and rhizosphere.


Subjects
Bradyrhizobium/genetics , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Genome, Bacterial , Sequence Analysis, DNA , Bacterial Proteins/genetics , Base Composition , Bradyrhizobium/isolation & purification , Bradyrhizobium/physiology , Metabolic Networks and Pathways/genetics , Molecular Sequence Data , Open Reading Frames , RNA, Untranslated/genetics , Soil Microbiology , Symbiosis , Synteny
5.
J Biomed Semantics ; 2: 4, 2011 Aug 02.
Article in English | MEDLINE | ID: mdl-21806842

ABSTRACT

BACKGROUND: The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. RESULTS: Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs. CONCLUSIONS: Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework.

6.
J Biomed Semantics ; 1(1): 8, 2010 Aug 21.
Article in English | MEDLINE | ID: mdl-20727200

ABSTRACT

Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems without the need to transfer entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers of emerging areas where a standard exchange data format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security are discussed. Consequently, we improved interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies.

7.
Bioinformatics ; 26(20): 2617-9, 2010 Oct 15.
Article in English | MEDLINE | ID: mdl-20739307

ABSTRACT

SUMMARY: The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it supports many widely used data formats and provides easy access to databases, external programs and public web services, including BLAST, KEGG, GenBank, MEDLINE and GO. BioRuby comes with a tutorial, documentation and an interactive environment, which can be used in the shell and in the web browser. AVAILABILITY: BioRuby is free and open source software, made available under the Ruby license. BioRuby runs on all platforms that support Ruby, including Linux, Mac OS X and Windows. With JRuby, BioRuby also runs on the Java Virtual Machine. The source code is available from http://www.bioruby.org/. CONTACT: katayama@bioruby.org


Subjects
Programming Languages , Software , Computational Biology , Databases, Factual , MEDLINE , Phylogeny , Sequence Analysis, Protein
8.
Nucleic Acids Res ; 38(Web Server issue): W706-11, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20472643

ABSTRACT

Web services have become widely used in bioinformatics analysis, but there exist incompatibilities in interfaces and data types, which prevent users from making full use of a combination of these services. Therefore, we have developed the TogoWS service to provide an integrated interface with advanced features. In the TogoWS REST (REpresentational State Transfer) API (application programming interface), we introduce a unified access method for major database resources through intuitive URIs that can be used to search, retrieve, parse and convert the database entries. The TogoWS SOAP API resolves compatibility issues found on the server and client-side SOAP implementations. The TogoWS service is freely available at: http://togows.dbcls.jp/.


Subjects
Computational Biology , Databases, Factual , Software , Internet , Systems Integration , User-Computer Interface
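
The TogoWS abstract above mentions a unified REST access method based on intuitive URIs for searching and retrieving entries. The sketch below only builds such URIs as strings and makes no network request. The base address is the one given in the abstract; the path layouts (`/entry/<database>/<ids>[/<field>][.<format>]` and `/search/<database>/<query>[/<offset>,<limit>]`), the database name `pubmed`, and the helper names `entry_url` and `search_url` are assumptions made for illustration and should be checked against the service's own documentation.

```python
from urllib.parse import quote

BASE_URL = "http://togows.dbcls.jp"  # service address given in the abstract


def entry_url(database, entry_ids, field=None, fmt=None):
    """Build an entry-retrieval URI (assumed layout, see the lead-in):
    /entry/<database>/<id>[,<id>...][/<field>][.<format>]
    """
    ids = ",".join(quote(entry_id) for entry_id in entry_ids)
    url = f"{BASE_URL}/entry/{quote(database)}/{ids}"
    if field:
        url += f"/{quote(field)}"
    if fmt:
        url += f".{quote(fmt)}"
    return url


def search_url(database, query, offset=None, limit=None):
    """Build a search URI (assumed layout, see the lead-in):
    /search/<database>/<query>[/<offset>,<limit>]
    """
    url = f"{BASE_URL}/search/{quote(database)}/{quote(query)}"
    if offset is not None and limit is not None:
        url += f"/{offset},{limit}"
    return url


# Hypothetical usage: the record ID is this article's own MEDLINE identifier.
example_entry = entry_url("pubmed", ["20472643"], fmt="json")
example_search = search_url("pubmed", "web services", offset=1, limit=10)
```

Keeping the database, identifiers, field and output format as separate path components is what lets a single URI pattern cover retrieval, parsing and format conversion.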
9.
Nucleic Acids Res ; 38(Database issue): D379-81, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19880388

ABSTRACT

CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data in various formats with other tools, seamlessly.


Subjects
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Genome, Bacterial , Synechocystis/genetics , Access to Information , Computational Biology/trends , Databases, Protein , Information Storage and Retrieval/methods , Internet , Open Reading Frames , Protein Structure, Tertiary , Software
10.
Nucleic Acids Res ; 36(20): 6386-95, 2008 Nov.
Article in English | MEDLINE | ID: mdl-18838389

ABSTRACT

Using full-length cDNA sequences, we compared alternative splicing (AS) in humans and mice. The alignment of the human and mouse genomes showed that 86% of 199 426 total exons in human AS variants were conserved in the mouse genome. Of the 20 392 total human AS variants, however, 59% consisted of all conserved exons. Comparing AS patterns between human and mouse transcripts revealed that only 431 transcripts from 189 loci were perfectly conserved AS variants. To exclude the possibility that the full-length human cDNAs used in the present study, especially those with retained introns, were cloning artefacts or prematurely spliced transcripts, we experimentally validated 34 such cases. Our results indicate that even retained-intron type transcripts are typically expressed in a highly controlled manner and interact with translating ribosomes. We found non-conserved AS exons to be predominantly outside the coding sequences (CDSs). This suggests that non-conserved exons in the CDSs of transcripts cause functional constraint. These findings should enhance our understanding of the relationship between AS and species specificity of human genes.


Subjects
Alternative Splicing , DNA, Complementary/chemistry , Evolution, Molecular , Amino Acyl-tRNA Synthetases/genetics , Animals , Base Sequence , Conserved Sequence , Data Interpretation, Statistical , Exons , Genomics , Humans , Introns , Mice , Phosphatidylinositol 3-Kinases/genetics , RNA, Messenger/chemistry , Species Specificity
11.
DNA Res ; 15(4): 227-39, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18511435

ABSTRACT

The legume Lotus japonicus has been widely used as a model system to investigate the genetic background of legume-specific phenomena such as symbiotic nitrogen fixation. Here, we report structural features of the L. japonicus genome. The 315.1-Mb sequences determined in this and previous studies correspond to 67% of the genome (472 Mb), and are likely to cover 91.3% of the gene space. Linkage mapping anchored 130-Mb sequences onto the six linkage groups. A total of 10,951 complete and 19,848 partial structures of protein-encoding genes were assigned to the genome. Comparative analysis of these genes revealed the expansion of several functional domains and gene families that are characteristic of L. japonicus. Synteny analysis detected traces of whole-genome duplication and the presence of synteny blocks with other plant genomes to various degrees. This study provides the first opportunity to look into the complex and unique genetic system of legumes.


Subjects
Genome, Plant , Lotus/genetics , Chromosome Mapping , DNA, Plant , Gene Duplication , Genes, Plant , In Situ Hybridization, Fluorescence , Plant Proteins/genetics , Plant Proteins/metabolism , Repetitive Sequences, Nucleic Acid , Sequence Analysis, DNA , Synteny
12.
Nucleic Acids Res ; 35(Database issue): D104-9, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17130147

ABSTRACT

The Human-transcriptome DataBase for Alternative Splicing (H-DBAS) is a specialized database of alternatively spliced human transcripts. In this database, each of the alternative splicing (AS) variants corresponds to a completely sequenced and carefully annotated human full-length cDNA, one of those collected for the H-Invitational human-transcriptome annotation meeting. H-DBAS contains 38,664 representative alternative splicing variants (RASVs) in 11,744 loci, in total. The data is retrievable by various features of AS, which were annotated manually, such as patterns of AS, the resulting alterations in the encoded amino acids and affected protein motifs, GO terms, predicted subcellular localization signals and transmembrane domains. The database also records recently identified very complex patterns of AS, in which two distinct genes seemed to be bridged, nested or degenerated (multiple CDS): in all three cases, completely unrelated proteins are encoded by a single locus. By using AS Viewer, each AS event can be analyzed in the context of full-length cDNAs, enabling the user's empirical understanding of the relation between an AS event and the consequent alterations in the encoded amino acid sequences together with various kinds of affected protein motifs. H-DBAS is accessible at http://jbirc.jbic.or.jp/h-dbas/.


Subjects
Alternative Splicing , DNA, Complementary/chemistry , Databases, Nucleic Acid , Computer Graphics , Humans , Internet , Proteins/chemistry , Proteins/genetics , Sequence Analysis, DNA , User-Computer Interface
13.
Nucleic Acids Res ; 34(14): 3917-28, 2006.
Article in English | MEDLINE | ID: mdl-16914452

ABSTRACT

We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of the full-length cDNAs. Applying both manual and computational analyses for 56,419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6,877 alternative splicing genes with 18,297 different alternative splicing variants. A total of 37,670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6,005 of the 6,877 genes. Notably, alternative splicing affected protein motifs in 3,015 genes, subcellular localizations in 2,982 genes and transmembrane domains in 1,348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or to have overlapping protein coding sequences (CDSs) of different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast expanding universe of alternative splicing variants.


Subjects
Alternative Splicing , DNA, Complementary/chemistry , Genome, Human , Proteins/genetics , RNA, Messenger/chemistry , Amino Acid Motifs , Amino Acid Sequence , Base Sequence , Computational Biology/methods , Exons , Genetic Variation , Genomics/methods , Humans , Proteins/chemistry , Proteins/physiology , RNA, Messenger/metabolism , Sequence Analysis, DNA
14.
Nucleic Acids Res ; 33(8): 2355-63, 2005.
Article in English | MEDLINE | ID: mdl-15860772

ABSTRACT

We investigated human alternative protein isoforms of >2600 genes based on full-length cDNA clones and SwissProt. We classified the isoforms and examined their co-occurrence for each gene. Further, we investigated potential relationships between these changes and differential subcellular localization. The two most abundant patterns were the one with different C-terminal regions and the one with an internal insertion, which together account for 43% of the total. Although changes of the N-terminal region are less common than those of the C-terminal region, extension of the C-terminal region is much less common than that of the N-terminal region, probably because of the difficulty of removing stop codons in one isoform. We also found that there are some frequently used combinations of co-occurrence in alternative isoforms. We interpret this as evidence that there is some structural relationship which produces a repertoire of isoformal patterns. Finally, many terminal changes are predicted to cause differential subcellular localization, especially in targeting either peroxisomes or mitochondria. Our study sheds new light on the enrichment of the human proteome through alternative splicing and related events. Our database of alternative protein isoforms is available through the internet.


Subjects
Protein Isoforms/chemistry , Protein Isoforms/classification , Alternative Splicing , Databases, Protein , Humans , Protein Isoforms/analysis , Protein Isoforms/genetics , Protein Sorting Signals , Proteomics , Sequence Analysis, Protein
15.
PLoS Biol ; 2(6): e162, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15103394

ABSTRACT

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.


Subjects
Computational Biology/methods , DNA, Complementary/genetics , Databases, Genetic , Genes/physiology , Genome, Human , Alternative Splicing/genetics , Genes/genetics , Humans , Internet , Microsatellite Repeats/genetics , Open Reading Frames/genetics , Polymorphism, Genetic , Polymorphism, Single Nucleotide , Protein Structure, Tertiary
16.
Nucleic Acids Res ; 32(Database issue): D75-7, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681362

ABSTRACT

DBTBS (http://dbtbs.hgc.jp) was originally released in 1999 as a reference database of published transcriptional regulation events in Bacillus subtilis, one of the best studied bacteria. It is essentially a compilation of transcription factors with their regulated genes as well as their recognition sequences, which were experimentally characterized and reported in the literature. Here we report its major update, which contains information on 114 transcription factors, including sigma factors, and 633 promoters of 525 genes. The number of references cited in the database has increased from 291 to 378. It also supports a function to find putative transcription factor binding sites within input sequences by using our collection of weight matrices and consensus patterns. Furthermore, though preliminarily, DBTBS now aims to contribute to comparative genomics by showing the presence or absence of potentially orthologous transcription factors and their corresponding cis-elements on the promoters of their potentially orthologously regulated genes in 50 eubacterial genomes.


Subjects
Bacillus subtilis/genetics , Databases, Nucleic Acid , Gene Expression Regulation, Bacterial , Promoter Regions, Genetic/genetics , Transcription, Genetic/genetics , Base Sequence , Conserved Sequence , Genomics , Phylogeny , Response Elements/genetics , Transcription Factors/metabolism
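
The DBTBS abstract above mentions finding putative transcription factor binding sites in input sequences with weight matrices. The sketch below shows one common, generic way such a scan can work: base counts are converted to log-odds scores and every window of the sequence is scored. It is not DBTBS's own implementation. The count matrix, the pseudocount, the uniform background, the threshold and the example sequence are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical 4-position count matrix (one dict of base counts per position).
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds scores against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append(
            {b: math.log2((column[b] + pseudocount) / total / background) for b in "ACGT"}
        )
    return matrix


def scan(sequence, matrix, threshold):
    """Return (0-based start, site, score) for every window scoring at or above the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


# Hypothetical input sequence and threshold.
hits = scan("TTACGTGGACGTAA", to_log_odds(COUNTS), threshold=4.0)
```

The pseudocount keeps bases that were never observed at a position from producing an infinite negative score, and the threshold trades sensitivity against the number of false-positive sites reported.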